Intro to Visualization in Python - Exploratory data analysis - 1

One should look for what is and not what he thinks should be. (Albert Einstein)

Exploratory Data Analysis: Topic introduction

In this part of the course, we will cover the following concepts:

  • Exploratory data analysis use cases
  • Performing EDA on data

Module completion checklist

Objective Complete
Discuss data visualization and exploratory data analysis
Describe chart types by data and form

The Challenger explosion example

  • The 1986 Space Shuttle Challenger explosion is an emblematic case study of how data visualization can play an essential role in decision-making


  • The explosion happened because low launch-day temperatures compromised the O-ring seals in a solid rocket booster


  • Edward Tufte, a visualization expert, argues that the cause of this tragedy was an unreadable format of data given to decision-makers


The original visualization of the temperature

  • The chart below was presented to the experts at the time


  • How easily interpretable do you think it is?


The revised visualization of the temperature

  • Edward Tufte argues a better chart may have prevented disaster


  • How easily interpretable do you think the revision he created is?


Data Visualization

  • Data visualization is an attempt to make data more easily digestible by rendering it in a visual context (e.g., charting, graphing, etc.)


  • We use data visualization to transform raw data into something compelling


  • Data visualization is at the intersection of art and science


Why visualize data?

  • Visual context provides insights on patterns, trends, and correlations that might be difficult to detect otherwise


  • It is a simple way to convey concepts and provide visual access to large amounts of complex data


  • Python is well suited to this because it has multiple graphing libraries with many valuable features


Why build a visualization?

  • To provide valuable, interpretable, and relevant insights


  • To give a visual or graphical representation of data / concepts


  • To communicate ideas


  • To provide an accessible way to see and understand trends, outliers, and patterns in data


  • To try to confirm a hypothesis


Chat Activity: Explore a Dashboard

  • What is a dashboard? It is a visual display of all your data
  • Let’s assume you work at a recruitment firm, and the firm has a dashboard to track and view its KPIs (Key Performance Indicators)
  • What KPIs would you like to track using that dashboard to help make better business decisions?
  • Share your thoughts in chat


Explore a dashboard

  • Take a couple of minutes to explore the dashboard, which is designed to answer various user questions about the recruitment department:

https://share.geckoboard.com/dashboards/JPVRGXEPTTUBIX4Y?_ga=2.198515816.631707739.1656352474-1254336470.1656352474

Exploratory data analysis (EDA)

  • Exploratory data analysis (EDA) is the process of reviewing new data to discover patterns, spot anomalies, test hypotheses, and check assumptions


  • It helps to be able to create graphs without breaking your train of thought as you explore your data


  • Visualization is an iterative process and consists of a few steps:
    • Analyze
    • Manipulate
    • Graph
    • Repeat
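The analyze, manipulate, graph, repeat loop above can be sketched in a few lines. This is only an illustration: the measurements are invented, the cutoff of 50 is an assumed cleaning rule, and matplotlib is assumed to be installed.

```python
import statistics

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical measurements, invented for illustration; one looks like a data-entry error.
values = [12.1, 11.8, 12.4, 12.0, 11.9, 120.0, 12.3, 12.2]

# Analyze: a mean far above the median hints at an extreme value.
mean_raw = statistics.mean(values)
median_raw = statistics.median(values)

# Manipulate: drop implausible readings (the cutoff of 50 is an assumption).
cleaned = [v for v in values if v < 50]

# Graph: compare the raw and cleaned series side by side.
fig, (ax_raw, ax_clean) = plt.subplots(1, 2, figsize=(8, 3))
ax_raw.plot(values, marker="o")
ax_raw.set_title("Raw")
ax_clean.plot(cleaned, marker="o")
ax_clean.set_title("Cleaned")
fig.tight_layout()

# Repeat: the cleaned plot now shows the small variations that the
# outlier had flattened, which prompts the next question to analyze.
```

In an interactive session, `plt.show()` would display the figure after each pass through the loop.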


Exploratory data analysis in Python

  • Python is a powerful tool for EDA because the graphics tie in with the functions used to analyze data


  • What is possible using Python?
    • Visualization tools are available through many packages (e.g., matplotlib, seaborn)
    • The visualizations created are high-quality graphics that can be saved as SVG, PNG, JPEG, BMP, or PDF
    • Visualizations are often the best way to display patterns in data for printed publications


  • Further, we will explore how to visualize data using Python and perform exploratory data analysis to understand and detect the patterns
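As a minimal sketch of saving one figure in several formats, the example below uses matplotlib's `savefig`, which picks the format from the file extension. The data, the file name `example`, and the temporary output directory are placeholders chosen for illustration, and only three of the formats listed above are shown.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # placeholder data
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# Write the same figure as vector (SVG, PDF) and raster (PNG) output.
out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for ext in ("svg", "png", "pdf"):
    path = os.path.join(out_dir, f"example.{ext}")
    fig.savefig(path, dpi=150)  # dpi mainly affects raster formats such as PNG
    saved.append(path)
```

Vector formats such as SVG and PDF scale without loss, which is usually what printed publications need.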


Getting started with data viz

  • Deciding on what visualization type to use will depend on the data and message you want to communicate


  • Common data types include:

    • Categorical
    • Univariate
    • Bivariate
    • Time-based (trending)
    • Text
    • Geospatial

Categorical Data

  • Categorical data is non-numeric or qualitative


  • Insight: comparisons and proportions


  • Chart types: vertical bar (column), horizontal bar, pie, bullet charts, stacked bar, and tree maps
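A small sketch of the comparisons and proportions that categorical data supports. The department labels are invented for illustration, and matplotlib is assumed to be available.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical categorical observations: one department label per employee.
departments = ["Sales", "Engineering", "Sales", "Support", "Engineering", "Sales"]

# Comparisons: how many observations fall into each category.
counts = Counter(departments)

# Proportions: each category's share of the whole.
total = sum(counts.values())
proportions = {name: n / total for name, n in counts.items()}

# Vertical bar chart, ordered from most to least frequent category.
labels, heights = zip(*counts.most_common())
fig, ax = plt.subplots()
ax.bar(labels, heights)
ax.set_ylabel("Number of employees")
ax.set_title("Employees per department (illustrative data)")
```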


Univariate Data

  • Univariate data consists of a single numeric variable


  • Insight: distributions, proportions, and frequencies


  • Chart types: histogram, density, box plots
    • Is the data normally distributed?
    • Are there any outliers?
    • Do you notice any other patterns in the data?
    • Answering these questions is one of the first steps of initial data exploration
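The questions above can be explored with a histogram, a box plot, and a simple outlier rule. The sketch below uses synthetic data and the common 1.5 × IQR fence, which is one convention among several rather than the only way to flag outliers. It assumes numpy and matplotlib are installed.

```python
import numpy as np

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic sample: roughly normal values plus two planted extremes.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(50, 5, 200), [95.0, 4.0]])

# Flag outliers with the 1.5 * IQR rule (the same rule a default box plot draws).
q1, q3 = np.percentile(sample, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sample[(sample < lower) | (sample > upper)]

# Histogram for the shape of the distribution, box plot for spread and outliers.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(sample, bins=20)
ax_hist.set_title("Histogram")
ax_box.boxplot(sample)
ax_box.set_title("Box plot")
fig.tight_layout()
```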


Bivariate Data

  • Bivariate data consists of two numeric variables (e.g., weight and height); data with more than two variables is multivariate


  • Insight: relationships, correlation, proportions, and frequencies


  • Chart types: scatterplot, bubble, parallel coordinates, radar, bullet, and heatmap
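To illustrate relationships and correlation, the sketch below draws a scatterplot of two synthetic variables and computes their Pearson correlation. The height and weight values are generated, not measured, and the linear relationship between them is an assumption built into the example.

```python
import numpy as np

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic data: weight is an assumed linear function of height plus noise.
rng = np.random.default_rng(1)
height_cm = rng.normal(170, 10, 100)
weight_kg = 0.9 * (height_cm - 100) + rng.normal(0, 5, 100)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(height_cm, weight_kg)[0, 1]

fig, ax = plt.subplots()
ax.scatter(height_cm, weight_kg, alpha=0.6)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title(f"Synthetic height vs. weight (r = {r:.2f})")
```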


Trend Data

  • Trend data includes a time-based variable (e.g., years, months, days, or hours)


  • Insight: trends, comparisons, and cycles


  • Chart types: line, area, bubble, vertical bar


Text Data

  • Text data includes alphanumeric single words or phrases (keywords)


  • Insight: sentiment, comparisons, and frequency


  • Chart types: word cloud, histogram, stacked bar chart
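Word frequency is often the first thing to compute for text data. The sketch below counts words in a few invented feedback comments and plots the most common ones. The comments and the tiny stopword set are hypothetical.

```python
import re
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical feedback comments, invented for illustration.
feedback = [
    "Great service and great support",
    "Slow support but friendly staff",
    "Great staff",
]

# Lowercase, split into words, and drop a few filler words.
words = re.findall(r"[a-z]+", " ".join(feedback).lower())
stopwords = {"and", "but"}
freq = Counter(w for w in words if w not in stopwords)

# Horizontal bars for the three most frequent words, most frequent on top.
top = freq.most_common(3)
top_words, top_counts = zip(*top)
fig, ax = plt.subplots()
ax.barh(top_words, top_counts)
ax.invert_yaxis()
ax.set_xlabel("Occurrences")
```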


Geospatial Data

  • Geospatial data includes qualitative or quantitative information about specific locations


  • Insight: locations, comparisons, and trends


  • Chart types: choropleth (filled) map, point map, connection map, isopleth map


Common visualizations

  • Let’s review when to use some of the common visualizations, including:

    • Tables
    • Bar charts
    • Line charts
    • Area charts
    • Heatmaps
    • Scatterplots

Simple text or table

  • Simple text is used when there is just a number or two to share. Simple text can be a great way to communicate something like:

    • 440 employees worked a total of 31,702 days, an average of 72.05 days per employee


  • Tables are helpful when communicating to a mixed audience or showing a few different units of measure
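The simple-text example above can be generated straight from the two totals, so the sentence stays in sync with the data. A minimal sketch using the numbers from the slide:

```python
# Totals taken from the example sentence above.
employees = 440
total_days = 31_702

# The average is derived rather than typed in by hand.
average_days = total_days / employees

summary = (
    f"{employees:,} employees worked a total of {total_days:,} days, "
    f"an average of {average_days:.2f} days per employee"
)
print(summary)
```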


Bar Chart

  • Bar charts are used to show larger variations in data, comparisons, rankings, and how individual data points relate to a whole


  • They express quantities through a bar’s length, using a common baseline of zero


  • Note: when the data has lengthy names, using a horizontal bar chart will make the data easier to read


Line Chart

  • Line charts are used to plot continuous data in some unit of time, such as days, months, quarters or years


  • They can also be used to show multiple series of data


  • A line graph can also represent a summary statistic, such as an average with its confidence interval, or the point estimate of a forecast
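A line showing a summary statistic with its uncertainty might look like the sketch below. The monthly observations are synthetic, and the band uses a normal approximation (mean ± 1.96 standard errors) as an assumed stand-in for a 95% confidence interval.

```python
import numpy as np

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic data: 30 observations per month around an assumed rising trend.
rng = np.random.default_rng(2)
months = np.arange(1, 13)
obs = 100 + 5 * months[:, None] + rng.normal(0, 10, size=(12, 30))

# Summary statistic per month and an approximate 95% interval around it.
mean = obs.mean(axis=1)
sem = obs.std(axis=1, ddof=1) / np.sqrt(obs.shape[1])
half_width = 1.96 * sem  # normal approximation

fig, ax = plt.subplots()
ax.plot(months, mean, marker="o", label="Monthly mean")
ax.fill_between(months, mean - half_width, mean + half_width,
                alpha=0.3, label="Approx. 95% interval")
ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.legend()
```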


Area Chart

  • Area charts are used to summarize relationships between datasets and to show how individual data points relate to a whole


  • The visual at the right shows the monthly trend of active operations


  • In chat, share your thoughts on how you think this visual could be improved


Heatmap

  • Heatmaps visualize data in tabular format, using colored cells to show the relative magnitude of the numbers


  • When using a heatmap, it is helpful to restrict the number of different color gradations


  • The visual at the right shows the busiest months ranked by the number of operations for each department
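A heatmap with a restricted number of color gradations can be sketched with matplotlib's `imshow` and a colormap resampled to a few discrete shades. The department names, months, and operation counts below are all invented for illustration, and five shades is an arbitrary choice.

```python
import numpy as np

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical operation counts per department (rows) and month (columns).
rng = np.random.default_rng(3)
departments = ["Cardiology", "Orthopedics", "Neurology"]
months = ["Jan", "Feb", "Mar", "Apr"]
operations = rng.integers(10, 100, size=(len(departments), len(months)))

# Limit the colormap to 5 discrete shades so cells are easy to tell apart.
cmap = plt.get_cmap("Blues", 5)

fig, ax = plt.subplots()
image = ax.imshow(operations, cmap=cmap)
ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)
ax.set_yticks(range(len(departments)))
ax.set_yticklabels(departments)

# Print the value in each cell so the table can still be read exactly.
for row in range(len(departments)):
    for col in range(len(months)):
        ax.text(col, row, str(operations[row, col]), ha="center", va="center")

fig.colorbar(image, ax=ax, label="Operations")
```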


Scatterplot

  • Scatterplots show the type of relationship between two numeric variables


  • Scatterplots are often used in scientific fields and are sometimes viewed as “complicated” to understand, but there are real-world uses as well


  • In chat, share your thoughts on what relationship this scatterplot represents


Review Quiz: Data Visualization

  • Question 1: Which of the two graphs makes it easier to determine which investment has the larger market share?
  • Share your answer in the chat


Review Quiz: Data Visualization

  • Question 2: What is this graph called?
  • Share your answer in the chat

Review Quiz: Data Visualization

  • Question 3: Which graph represents the values accurately? Why do you think so?
  • Share your answer in the chat


Review Quiz: Data Visualization

  • Question 4: What type of graph is this?
  • Share your answer in the chat

Review Quiz: Data Visualization

  • Question 5: Which of the following graphs focuses on trends rather than individual values?
  • Share your answer in the chat


Knowledge check


Link: Click here to complete the knowledge check


Congratulations on completing this module!
